Telematics data provides information about vehicle location, speed,
pickup and drop off locations, acceleration, braking, time of trip and
more. Given that the resources on telematics data are quite limited, the
first part of this capstone project aims at understanding and
visualizing such data. Building off of this part, we then pair
telematics data with census data to identify patterns and contextualize
the results, in this case looking at median income in Ohio in the
Covid-19 era. With the spread of Covid-19 in 2020 and the Ohio governor
issuing a stay-at-home mandate, people had to transition to working from
home. During this period, many people lost their jobs, especially in
low-income families. On the telematics side, we wanted to see how the
pandemic affected rideshare usage. We layered American Community Survey
(ACS) median income level data with pickup locations to see trends. We
found that the number of trips increased significantly in 2021, while
lower-income communities were not using rideshare apps.
Keywords: Telematics Data, Census data, Visualizations,
Ride-share apps
The problem this capstone project aims to address is two-fold: first, documenting how to use and visualize telematics data, and second, pairing telematics data with census data in order to find relevant patterns. During our first meeting, our sponsor, 99P Labs, asked us to analyze and visualize a telematics dataset they provided. The dataset was collected from a rideshare service and contained tables on pickup and drop-off locations and times, vehicle routes, safety metrics, types of customers, and more. The initial goal of this project was to find patterns in the data and build a story that interprets those patterns. We were therefore each tasked with conducting an individual exploratory data analysis to surface a diverse range of potential patterns. However, we quickly realized that before getting to the analysis portion of the project, we needed to understand what the data meant. The data dictionary we were given did little to explain what the tables and variables cover, and external sources offered little guidance on loading and visualizing data of this kind. This brings us to the first problem we set out to solve: documenting what telematics data is and how to visualize it.
The second part of this project was to meet our sponsor's request to create a story with the telematics data. Because the data spans 2020 to 2023, we thought it would be interesting to see how Covid-19 affected the distribution of pickup locations. We also decided to pair the telematics data with census data to identify and investigate any trends that emerge. Specifically, we used census data on median income to calculate the percentage change in median household income from 2020 to 2021, and then overlaid the pickup locations from 2020 and 2021 to pinpoint any relevant patterns. Such a story can help establish whether there is a possible correlation between median household income and rideshare trips.
On a broader scale, telematics data can help identify ways in which not only vehicles but also entire transport systems can operate more efficiently. Reducing traffic congestion and fuel consumption are just two examples of putting telematics data to good use. Moreover, in the current automotive landscape, the green energy transition and AI are at the forefront of innovation. Telematics data can further these innovations and support more sustainable transportation systems. It can also act as a primary source of data to inform, structure, and build new AI models that make traveling safer and faster. In short, as with any other data source, telematics data can be used to identify areas of improvement if analyzed properly.
In 2017, Uber launched the Movement website, which shared the data Uber gathered for 100,000 cities across the world. The website aimed to promote better urban planning by offering data-driven insights into commuting. It offered specific information about travel, including travel conditions across different times of day, days of the week, or months of the year, and how travel times are affected by big events, road closures, or other things happening in a city. As a result, Uber Movement helped urban planners evaluate which parts of a city need to be expanded and manage infrastructure more effectively. Because it was open to the public, the website also helped commuters plan their trips and respond to disruptions in transportation systems. However, as of October 1, 2023, the website is no longer available, for unknown reasons.
Past research has looked into the impact of the pandemic on shared mobility systems (Menon et al. 2020). It found that several shared mobility systems, including buses and taxis, were negatively affected by the pandemic because some people perceived them as “unsafe” due to challenges with social distancing. While the crisis led to a 50 to 90 percent decline in transit ridership in major metropolitan areas, based on reports from transportation apps, it was anticipated that low-income households would likely switch to public transportation during the pandemic restrictions.
The telematics dataset we were given lacked documentation: the existing data dictionary did not fully clarify what certain variables mean or how they relate to each other across tables. We therefore had to work out what the data means and how best to visualize it (e.g., visualizing real-time location data as the trajectory of a trip).
While past work, such as Uber Movement, has presented very detailed and comprehensive pictures of transportation and travel, it did not focus on making connections with other aspects of people’s lives. It is important, however, to understand the data within a broader socioeconomic context. We believe that household income is an important predictor of trip making and activity engagement. This view is supported by past studies predicting that low-income households would likely rely on public transportation more than private transportation during the pandemic (Taylor and Wasserman 2020).
Therefore, we aim to contextualize rideshare trip information by looking at trip patterns in relation to the change in median income during the pandemic at the county level. To visualize the patterns, we overlay census data on the trip request telematics data, specifically the pickup locations.
In summary, our project contributes to the existing literature and resources in two main ways, corresponding to the first and second parts of this project respectively. The first contribution is a single resource that people can use to learn about telematics data and how to load, visualize, and interpret it. We have documented the tips and information that helped us navigate such data so that others can avoid the problems we encountered. The second contribution is insight into the potential relationship between the percentage change in median household income and rideshare trip patterns from 2020 to 2021. While such changes might be attributed to the Covid-19 pandemic, it is still crucial to understand how both median household income and the distribution of rideshare trips changed relative to the pandemic and to each other.
Our dataset was provided by 99P Labs. For our project, we focused on two of the eight tables made available: TRIP_REQUEST_202308141518.csv and VEHICLE_LOCATION_202308141525.csv. VEHICLE_LOCATION was our largest table, at around 150 MB.
The TRIP_REQUEST table comprised 17 variables, and our analysis concentrated on two of them: PICKUP_ADDRESS and DROPOFF_ADDRESS. Using the tidygeocoder library, we converted these addresses into coordinates. In our Telematics Data 101 document, we then mapped the pickup and drop-off locations using these coordinates.
The VEHICLE_LOCATION table featured a total of 5 variables. For our project, we used VEHICLE_ID, EVENT_TIMESTAMP, LAT, and LNG, and set aside SPEED_MPH as unnecessary. In Telematics Data 101, we mapped the route of a single vehicle on October 3rd, 2021. This choice was motivated by the sheer size of the dataset, which made it impractical to map more than one vehicle effectively.
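The single-vehicle, single-day selection described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the project. The column names come from the VEHICLE_LOCATION table, but the sample rows, the vehicle IDs, and the timestamp format are assumptions made for the example.

```python
import csv
import io
from datetime import date, datetime

# Made-up sample rows; only the column names come from the VEHICLE_LOCATION table.
SAMPLE = """VEHICLE_ID,EVENT_TIMESTAMP,LAT,LNG,SPEED_MPH
V1,2021-10-03 09:05:00,39.9650,-83.0010,22
V2,2021-10-03 09:01:00,40.1000,-82.9000,30
V1,2021-10-03 09:00:00,39.9612,-82.9988,0
V1,2021-10-04 08:00:00,39.9700,-83.0100,15
"""


def vehicle_route(rows, vehicle_id, day, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (lat, lng) points for one vehicle on one day, in time order.

    `fmt` is an assumed timestamp format; adjust it to match the real file.
    """
    points = []
    for row in rows:
        stamp = datetime.strptime(row["EVENT_TIMESTAMP"], fmt)
        if row["VEHICLE_ID"] == vehicle_id and stamp.date() == day:
            points.append((stamp, float(row["LAT"]), float(row["LNG"])))
    # Sort by timestamp so consecutive points trace the route in order.
    points.sort(key=lambda p: p[0])
    return [(lat, lng) for _, lat, lng in points]


rows = csv.DictReader(io.StringIO(SAMPLE))
route = vehicle_route(rows, "V1", date(2021, 10, 3))
print(route)
```

Because the reader yields one row at a time, the same function could be pointed at a large file without holding every row in memory. Only the filtered points are kept.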
For part one of our project, the main tasks were loading the telematics data into RStudio, geocoding the addresses, and creating visualizations of the data. We faced multiple difficulties while loading the data into R. Our dataset included a total of 8 tables, some small and others very large. The largest table was around 150 MB and took around 30 minutes to load on our local computers. We attempted to upload our data to our GitHub repository, but found that we could not upload files larger than 25 MB. To resolve this, we added these files to our .gitignore and moved our working directory to the RStudio server, which allowed us to load large datasets more efficiently than on our local computers. Next, we geocoded the pickup and drop-off addresses in the trip request dataset into their respective latitudes and longitudes. This allowed us to use the mapping package Leaflet to create interactive data visualizations.
In our code, we first used the ‘osm’ geocoding method. With it, geocoding all of the addresses in the dataset took around two hours. In addition, our laptops had to stay open and connected to wifi; otherwise, the code would fail and we would have to restart it. Once we changed the method to ‘census’ and started using the rstudio2 server, our code ran in less than 10 minutes and all of our addresses were geocoded. The ‘census’ method also allowed us to geocode the locations without acquiring an API key, which other methods required. We saved the resulting latitudes and longitudes into a data file (trip_data_full), which could then be visualized using the Leaflet interactive mapping package.
When we attempted to map all the pickup or drop-off locations using Leaflet, the code would crash due to the number of locations. We therefore used the clustering functionality in Leaflet, which groups the markers and shows the number of items in each cluster; as one zooms in, the clusters are adjusted based on the current view. The visualizations of the pickup requests can be seen in Figure 1.
Along with the trip request dataset, we were also interested in the vehicle location dataset. As seen in Figure 2, we created a visualization of the routes taken by one particular vehicle on a particular day, which required filtering for a specific date and vehicle ID. Because the location of the vehicle is reported as a data point throughout the entire ride, we were able to plot these data points to display the entire route of the vehicle. Finally, we used the census data to visualize the median income per county and used color palettes to show the difference in income levels.
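To give a feel for why clustering keeps a dense map readable, the sketch below counts points per grid cell and shows how a smaller cell, analogous to zooming in, splits one large cluster into several smaller ones. This is a simplified Python stand-in and not the algorithm Leaflet actually uses. The coordinates, the function name grid_clusters, and the cell sizes are assumptions made for illustration.

```python
import math
from collections import Counter


def grid_clusters(points, cell_deg):
    """Count points per square grid cell with sides of `cell_deg` degrees.

    A simplified stand-in for map marker clustering: each cell becomes one
    "cluster" labelled with the number of points that fall inside it.
    """
    counts = Counter()
    for lat, lng in points:
        cell = (math.floor(lat / cell_deg), math.floor(lng / cell_deg))
        counts[cell] += 1
    return counts


# Made-up pickup coordinates, not taken from the project data.
pickups = [
    (39.961, -82.999),
    (39.962, -82.998),
    (39.995, -82.955),
    (41.499, -81.694),
]

coarse = grid_clusters(pickups, 1.0)   # "zoomed out": large cells
fine = grid_clusters(pickups, 0.01)    # "zoomed in": small cells

print(len(coarse), "clusters when zoomed out")
print(len(fine), "clusters when zoomed in")
```

With the large cell size, the three nearby points collapse into a single cluster. With the small cell size, only the two closest points still share a cell.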
For part two of our project, we were interested in exploring the relationship between areas where pickup locations are concentrated and changes in median household income. To answer this question, we paired American Community Survey (ACS) data with the trip request table from the telematics data that 99P Labs provided. The ACS is a U.S. Census Bureau program that annually gathers data on the characteristics of the American population and housing. We focused on the median household income in each county in Ohio in 2020 and 2021. We mapped the percent difference in median income between 2020 and 2021 and used color grading to indicate the magnitude and direction of the percent differences. We then paired the ACS data with the telematics data, specifically the pickup locations in 2020 and 2021. Clustering the locations in the telematics data highlighted the areas of concentrated rideshare requests. By pairing the ACS and telematics data, we hoped to find relationships between the concentrations of pickup locations and changes in median household income.
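The percent difference in median income can be illustrated with a short calculation. The Python sketch below assumes the usual definition of percent change with 2020 as the base year; the text does not spell out the exact formula. The county names and income figures are hypothetical and are not ACS values.

```python
def percent_change(old, new):
    """Percent change from `old` to `new`, relative to `old` (assumed definition)."""
    return (new - old) * 100.0 / old


def direction(pct):
    """Label the sign of a percent change, as a color scale would."""
    if pct > 0:
        return "increase"
    if pct < 0:
        return "decrease"
    return "no change"


# Hypothetical county-level median household incomes (not real ACS figures).
median_income = {
    "County A": {2020: 50000, 2021: 52500},
    "County B": {2020: 40000, 2021: 39000},
}

changes = {
    county: percent_change(values[2020], values[2021])
    for county, values in median_income.items()
}

for county, pct in changes.items():
    print(f"{county}: {pct:+.1f}% ({direction(pct)})")
```

A diverging color scale centered on zero is one natural way to show both magnitude and direction on a map, since the sign of the change sets the hue and its size sets the intensity.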
For the first part of our project, we produced a document titled Telematics Data 101. It gives an overview of what telematics data is, how to load it and the challenges that come with doing so, how to geocode it, how to download census data so that it can be paired with telematics data, and, finally, how to visualize it. In the Introduction and Motivation section and the Methods section of this paper, we have discussed what telematics data is, why it is important to study, and how to handle the challenges of loading and geocoding large telematics datasets. This work laid the foundation for visualizing telematics data in different ways.
A few examples of the data visualizations we came up with are shown and briefly explained below: